ITCH: Information-Theoretic Cluster Hierarchies

Authors

  • Christian Böhm
  • Frank Fiedler
  • Annahita Oswald
  • Claudia Plant
  • Bianca Wackersreuther
  • Peter Wackersreuther
Abstract

Hierarchical clustering methods are widely used in various scientific domains such as molecular biology, medicine, and economics. Despite the maturity of the research field of hierarchical clustering, we have identified the following four goals which are not yet fully satisfied by previous methods: First, to guide the hierarchical clustering algorithm to identify only meaningful and valid clusters. Second, to represent each cluster in the hierarchy by an intuitive description such as a probability density function. Third, to consistently handle outliers. And finally, to avoid difficult parameter settings. With ITCH, we propose a novel clustering method that is built on a hierarchical variant of the information-theoretic principle of Minimum Description Length (MDL), referred to as hMDL. Interpreted as a statistical model of the data set, the hierarchical cluster structure can be used for effective data compression by Huffman coding. Thus, the achievable compression rate induces a natural objective function for clustering, which automatically satisfies all four of the above-mentioned goals.
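The idea of using compression as a clustering objective can be illustrated with a minimal sketch. This is not the hMDL criterion of ITCH, which the abstract does not spell out; it is a generic flat MDL-style score under stated assumptions: one-dimensional data, one Gaussian probability density function per cluster, the ideal code length of -log2(p) bits per point in place of an actual Huffman code, and a standard 0.5 * log2(n) bits per parameter. All function names and the `resolution` parameter are hypothetical.

```python
import math

def gaussian_code_length_bits(points, mean, std, resolution=0.01):
    """Ideal code length (bits) of 1-D points under a Gaussian pdf.

    The density is discretized at `resolution` (an assumption of this
    sketch) so that each point has a probability, hence a code length.
    """
    total = 0.0
    for x in points:
        density = math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))
        total += -math.log2(density * resolution)
    return total

def fit_gaussian(points):
    """Maximum-likelihood mean and standard deviation (std floored at 1e-3)."""
    mean = sum(points) / len(points)
    var = sum((x - mean) ** 2 for x in points) / len(points)
    return mean, max(math.sqrt(var), 1e-3)

def description_length(clusters, resolution=0.01):
    """Total bits: cluster-ID cost + parameter cost + data cost per cluster."""
    n = sum(len(c) for c in clusters)
    total = 0.0
    for c in clusters:
        mean, std = fit_gaussian(c)
        total += len(c) * -math.log2(len(c) / n)   # which cluster each point belongs to
        total += 2 * 0.5 * math.log2(len(c))       # two parameters: mean and std
        total += gaussian_code_length_bits(c, mean, std, resolution)
    return total

# Two well-separated groups: splitting them should compress better.
group_a = [0.9, 1.0, 1.1, 1.05, 0.95]
group_b = [9.9, 10.0, 10.1, 10.05, 9.95]
bits_one_cluster = description_length([group_a + group_b])
bits_two_clusters = description_length([group_a, group_b])
print(bits_two_clusters < bits_one_cluster)
```

In this toy setting the two-cluster model yields the shorter description, which is the sense in which a compression rate can act as an objective function: a clustering is preferred when it lets the data be encoded in fewer bits, with no separately tuned threshold.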


Similar resources

Hierarchical Information Clustering by Means of Topologically Embedded Graphs

We introduce a graph-theoretic approach to extract clusters and hierarchies in complex data-sets in an unsupervised and deterministic manner, without the use of any prior information. This is achieved by building topologically embedded networks containing the subset of most significant links and analyzing the network structure. For a planar embedding, this method provides both the intra-cluster...


Nonlinear Schrödinger Equations for Identical Particles and the Separation Property

We investigate the separation property for hierarchies of Schrödinger operators for identical particles. We show that such hierarchies of translation invariant second order differential operators are necessarily linear. A weakened form of the separation property, related to a strong form of cluster decomposition, allows for homogeneous hierarchies of nonlinear differential operators. Some conne...


Hierarchical Information-theoretic Co-clustering for High Dimensional Data

Hierarchical clustering is an important technique for hierarchical data exploration applications. However, most existing hierarchical methods are based on traditional one-side clustering, which is not effective for handling high dimensional data. In this paper, we develop a partitional hierarchical co-clustering framework and propose a Hierarchical Information-Theoretical Co-Clustering (HITCC) a...


An information theoretic approach to hierarchical clustering combination

In hierarchical clustering, a set of patterns is partitioned into a sequence of groups represented as a dendrogram. The dendrogram is a tree representation where each node is associated with the merging of two (or more) partitions, and hence each partition is nested into the next partition. Hierarchical representation has properties that are useful for visualization and interpretation of clustering...


GALOIS: An Order-Theoretic Approach to Conceptual Clustering

The theory of concept (or Galois) lattices provides a natural and formal setting in which to discover and represent concept hierarchies. In this paper we present a system, GALOIS, which is able to determine the concept lattice corresponding to a given set of objects. GALOIS is incremental and relatively efficient, the time complexity of each update ranging from O(n) to O(n^2) where n is the numb...




Publication date: 2010